New stopping criteria for segmenting DNA sequences.
Author
Abstract
We propose a solution to the problem of choosing a stopping criterion when segmenting inhomogeneous DNA sequences with complex statistical patterns. The new stopping criterion is based on the Bayesian information criterion within the model selection framework. When this criterion was applied to a telomere of S. cerevisiae and to the complete sequence of E. coli, the borders of biologically meaningful units were identified and a more reasonable number of domains was obtained. We also introduce a measure called segmentation strength, which can be used to control the delineation of large domains. The relationship between the average domain size and the threshold of segmentation strength is determined for several genome sequences.
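The idea of a BIC-based stopping rule can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the paper's implementation: each segment is modeled as independent draws from a multinomial over the alphabet, a split adds `alphabet_size - 1` composition parameters plus one breakpoint parameter, and the names `log_likelihood`, `best_split`, `segment`, and `min_strength` are hypothetical. The normalization used for the strength value, (gain - penalty) / penalty, is one plausible choice and may differ from the paper's definition.

```python
import math
from collections import Counter


def log_likelihood(seq):
    """Maximized multinomial log-likelihood of seq (natural log)."""
    n = len(seq)
    return sum(c * math.log(c / n) for c in Counter(seq).values())


def best_split(seq):
    """Return (position, gain) for the split maximizing
    gain = 2 * (logL of two segments - logL of one segment).
    Brute force, O(N^2); fine for a sketch, slow for genomes."""
    base = log_likelihood(seq)
    best_pos, best_gain = None, 0.0
    for i in range(1, len(seq)):
        gain = 2.0 * (log_likelihood(seq[:i]) + log_likelihood(seq[i:]) - base)
        if gain > best_gain:
            best_pos, best_gain = i, gain
    return best_pos, best_gain


def segment(seq, offset=0, alphabet_size=4, min_strength=0.0):
    """Recursively split seq; return sorted breakpoint positions.

    A split is kept only if the BIC of the two-segment model is lower
    than that of the one-segment model, i.e. gain > penalty.
    ASSUMPTION: the penalty counts (alphabet_size - 1) extra composition
    parameters plus 1 breakpoint parameter, times ln(N).
    ASSUMPTION: strength = (gain - penalty) / penalty; raising
    min_strength above 0 keeps only stronger (larger-scale) borders.
    """
    n = len(seq)
    if n < 2:
        return []
    pos, gain = best_split(seq)
    if pos is None:
        return []
    penalty = alphabet_size * math.log(n)  # (alphabet_size - 1) + 1 parameters
    strength = (gain - penalty) / penalty
    if strength <= min_strength:
        return []
    left = segment(seq[:pos], offset, alphabet_size, min_strength)
    right = segment(seq[pos:], offset + pos, alphabet_size, min_strength)
    return left + [offset + pos] + right
```

Calling `segment("A" * 50 + "G" * 50)` on this toy input finds the single border between the two homogeneous halves, while a larger `min_strength` suppresses it. A practical version would update symbol counts incrementally instead of recomputing them for every candidate split.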
Similar resources
Comparing different stopping criteria for fuzzy decision tree induction through IDFID3
Fuzzy Decision Tree (FDT) classifiers combine decision trees with the approximate reasoning offered by fuzzy representation to deal with language and measurement uncertainties. When an FDT induction algorithm uses stopping criteria to halt the tree's growth early, the threshold values of those criteria control the number of nodes. Finding a proper threshold value for a stopping crite...
Optimal Stopping Policy for Multivariate Sequences a Generalized Best Choice Problem
In the classical versions of the “Best Choice Problem”, the sequence of offers is a random sample from a single known distribution. We present an extension of this problem in which the sequential offers are random variables drawn from multiple independent distributions. Each distribution function represents a class of investments or offers. Offers appear in no specified order. The objective is...
An Introduction to a New Criterion Proposed for Stopping GA Optimization Process of a Laminated Composite Plate
Several traditional stopping criteria in Genetic Algorithms (GAs) are applied to the optimization process of a typical laminated composite plate. The results show that neither the criteria based on statistical parameters nor those based on theoretical models perform satisfactorily in determining the interruption point for the GA process. Here, considering the configuration of ...
A PRACTICAL APPROACH TO REAL-TIME DYNAMIC BACKGROUND GENERATION BASED ON A TEMPORAL MEDIAN FILTER
In many computer vision applications, segmenting and extracting moving objects in video sequences is an essential task. Background subtraction, in which the reference image is subtracted from each input image, has often been used for this purpose. In this paper, we offer a novel background-subtraction technique for real-time dynamic background generation using color images that are taken fro...
Segmenting DNA sequence into 'words' based on statistical language model
This paper presents a novel method to segment/decode DNA sequences based on an n-gram statistical language model. First, by analyzing the genomes of 12 model species, we find that the length of most DNA “words” is 12 to 15 bp. The bound of the language entropy of DNA sequences is about 1.5674 bits. After building an n-gram biological language model, we design an unsupervised ‘probability approach...
Journal title:
- Physical Review Letters
Volume 86, Issue 25
Pages -
Publication date: 2001